Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bspline #39

Merged
merged 24 commits into from
Sep 14, 2023
Merged

Bspline #39

merged 24 commits into from
Sep 14, 2023

Conversation

BalzaniEdoardo
Copy link
Collaborator

Added BSplineBasis and CyclicBSplineBasis to the basis module

Copy link
Member

@billbrod billbrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments, largely on inheritance structure and documentation. Some bigger points:

  • in general, as mentioned in Docstring/signature and inheritance #30, having evaluate come from Basis and the subclasses implement _evaluate gets confusing, because the docstrings for the function we want users to call is generic. might be worth revisiting whether the super class has evaluate or _evaluate (which the subclasses call). The other way to do this would be to remove the eval_basis = self._evaluate(*xi) from Basis.evaluate() and have the subclasses call super().evaluate() to get the checks, but then I don't know whether we can then require users to implement evaluate (maybe we can, just by marking it as @abstract, but I'm not sure). Out of scope for this PR, but it's making me even more inclined to thinking we should change this. If you'd like, I can try it out while you're working on the GLM class (it shouldn't change the interface at all).
  • For all spline bases, does knot_locs need to be an attribute (or at least, a public one)? seems like _generate_knots is always called when it's needed. And we don't expect the user to use it, right? so probably shouldn't be an attribute
  • I don't love the tests (e.g., test_minimum_number_of_basis_required_is_matched for either BSpline) that have a long boolean expression as raise_exception and then a bunch of ORs in the match for pytest.raises -- it feels like we're losing the specificity of the test (and the readability), since I don't know which error is expected in which condition. the alternative would be to split them up into separate tests or to have an if block defining the match string. both are long, but I think the second is worth doing, since it seems like a decent tradeoff between readability and compactness. Thoughts?

src/neurostatslib/basis.py Show resolved Hide resolved
def _check_samples_non_empty(sample_pts):
if sample_pts.shape[0] == 0:
raise ValueError(
"Empty sample array provided. At least one sample is required for evaluation!"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here and only for BSpline?

from the tests, looks like you're expecting it when evaluate_on_grid(x) for x<=0, but is that really an issue? for the negative case, we'll get an error in np.linspace and for x=0, you'd get an empty array otherwise, which seems reasonable. Is there a specific reason to check for this?

and in general, don't like that we'll have different behavior for BSpline than for the rest, when really, it's the same concerns, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't check, with x=0 you get an "invalid input data" error message from the scipy splev call.

I am fine with that but:

  • it is a bit nested and less clear for a user what may have caused it.
  • MSpline, RaisedCosineLog, RaisedCosineLinear returns an empty array, not an error.
  • OrthExp raises another class specific error, which is that you need at least as many samples as basis function to define the basis (otherwise the ortonormalization would be not-defined).

Uniformity in behavior is hard to get given that different basis construction have different requirements.
My rationale here was try to surface the error and give a basis-type specific explanation of the error which is understandable.
If we try to get something that looks more uniform there are a few way:

  • return an empty array for B-spline too instead of calling splev in case of empty input
  • raise the error instead of returning an empty array for all of the basis function we have

Which one do you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I'm inclined to work towards uniformity in behavior, and I don't have a strong preference here, but I think raising the error for all of them is the right way? because if someone calls evaluate_on_grid(0), they've really misunderstood what it does. right?

src/neurostatslib/basis.py Outdated Show resolved Hide resolved
src/neurostatslib/basis.py Outdated Show resolved Hide resolved
src/neurostatslib/basis.py Outdated Show resolved Hide resolved
src/neurostatslib/basis.py Outdated Show resolved Hide resolved
src/neurostatslib/basis.py Outdated Show resolved Hide resolved
src/neurostatslib/basis.py Outdated Show resolved Hide resolved
src/neurostatslib/basis.py Outdated Show resolved Hide resolved
src/neurostatslib/basis.py Outdated Show resolved Hide resolved
@@ -94,6 +94,11 @@ def evaluate(self, *xi: NDArray) -> NDArray:
-------
:
The generated basis functions.
Raises
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there's supposed to be an empty line here

def _check_samples_non_empty(sample_pts):
if sample_pts.shape[0] == 0:
raise ValueError(
"Empty sample array provided. At least one sample is required for evaluation!"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I'm inclined to work towards uniformity in behavior, and I don't have a strong preference here, but I think raising the error for all of them is the right way? because if someone calls evaluate_on_grid(0), they've really misunderstood what it does. right?

Copy link
Member

@billbrod billbrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just remove the one commented-out line, and let's discuss the uniformity of behavior for evaluate_on_grid(0).

Like Guillaume mentioned, I think these tests are really hard to understand, but that seems out of scope here.

The evaluation is performed by looping over each element and using `splev`
from SciPy to compute the basis values.
"""
super()._check_samples_non_empty(sample_pts)
# super()._check_samples_non_empty(sample_pts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not using this, should remove this line

@BalzaniEdoardo BalzaniEdoardo merged commit f1194c3 into main Sep 14, 2023
2 checks passed
@BalzaniEdoardo BalzaniEdoardo deleted the bspline branch September 14, 2023 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants